Acoustic word embedding model based on Bi-LSTM and convolutional-Transformer
Yunyun GAO, Lasheng ZHAO, Qiang ZHANG
Journal of Computer Applications    2024, 44 (1): 123-128.   DOI: 10.11772/j.issn.1001-9081.2023010062
Abstract

In Query-by-Example Spoken Term Detection (QbE-STD), the Acoustic Word Embedding (AWE) representations extracted by a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) alone capture limited speech information. To represent speech content better and improve model performance, an acoustic word embedding model based on Bi-directional Long Short-Term Memory (Bi-LSTM) and a convolutional-Transformer was proposed. Firstly, Bi-LSTM was used to extract features and model the speech sequences, with stacked layers improving the model's learning ability. Secondly, to learn local information while capturing global information, a CNN and a Transformer encoder were connected in parallel to form the convolutional-Transformer, which exploited the complementary strengths of the two branches in feature extraction to aggregate more effective information and improve the discriminability of the embeddings. Under the constraint of a contrastive loss, the Average Precision (AP) of the proposed model reaches 94.36%, which is 1.76% higher than that of the attention-based Bi-LSTM model. The experimental results show that the proposed model effectively improves performance on QbE-STD.
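The abstract states that the embeddings are trained under a contrastive loss, so that embeddings of spoken examples of the same word are pulled together while those of different words are pushed apart. The exact formulation used in the paper is not given; the sketch below shows one common margin-based triplet variant over cosine similarity, purely as an illustration (the `margin` value and the cosine-similarity choice are assumptions, not the authors' reported settings).

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_loss(anchor, positive, negative, margin=0.5):
    """Margin-based contrastive (triplet) loss on AWEs.

    anchor/positive are embeddings of two utterances of the SAME word;
    negative is an embedding of a DIFFERENT word. The loss is zero once
    the positive pair is at least `margin` more similar than the
    negative pair, otherwise it penalizes the similarity gap.
    """
    pos_sim = cosine(anchor, positive)
    neg_sim = cosine(anchor, negative)
    return max(0.0, margin + neg_sim - pos_sim)

# Illustrative embeddings (hypothetical 2-D vectors, not real AWEs):
a = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])   # same word, identical embedding
n = np.array([0.0, 1.0])   # different word, orthogonal embedding
print(contrastive_loss(a, p, n))  # → 0.0 (pair already well separated)
```

In QbE-STD retrieval itself, the same cosine similarity between the query's embedding and each candidate segment's embedding is what ranks the detection hypotheses, which is why improving the discriminability of the embeddings translates directly into a higher AP.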
